Recently, vehicle similarity learning, also known as re-identification (ReID), has attracted considerable attention in computer vision. Several algorithms have been developed and achieved notable success. However, most existing methods perform poorly in hazy scenes due to reduced visibility. Although some strategies can address this problem, they still leave room for improvement because of their limited performance in real-world scenarios and the lack of clear real-world ground truth. Thus, to resolve this problem, we construct a training paradigm called \textbf{RVSL} that integrates ReID and domain-transformation techniques. The network is trained in a semi-supervised fashion and does not require ID labels or corresponding clear ground truth to learn the hazy-vehicle ReID task in the real world. To further constrain the unsupervised learning process, several losses are developed. Experimental results on synthetic and real-world datasets show that the proposed method achieves state-of-the-art performance on the hazy vehicle ReID problem. Notably, although the proposed method is trained without real-world label information, it achieves performance competitive with existing supervised methods trained on full label information.
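As a rough sketch of the semi-supervised idea (the abstract does not specify the losses, so the network interfaces, the dummy modules, and the feature-consistency term below are all illustrative assumptions, not the authors' method), one can combine a supervised branch on labelled synthetic data with an unsupervised consistency branch on unlabelled real hazy images:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DummyReID(nn.Module):
    """Stand-in feature extractor + ID classifier (hypothetical interface)."""
    def __init__(self, num_ids=100):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(16, num_ids)
    def forward(self, x):
        feat = self.backbone(x)
        return self.classifier(feat), feat

dehaze_net = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))  # stand-in dehazer
reid_net = DummyReID()

synth_hazy = torch.rand(4, 3, 64, 64)    # synthetic hazy images (labelled)
synth_clear = torch.rand(4, 3, 64, 64)   # paired clear ground truth
synth_ids = torch.randint(0, 100, (4,))  # vehicle ID labels
real_hazy = torch.rand(4, 3, 64, 64)     # real hazy images (unlabelled)

# Supervised branch: synthetic data has ID labels and clear ground truth.
logits, _ = reid_net(synth_hazy)
id_loss = F.cross_entropy(logits, synth_ids)
dehaze_loss = F.l1_loss(dehaze_net(synth_hazy), synth_clear)

# Unsupervised branch: real data has neither, so constrain ReID features
# to stay consistent between the hazy input and its dehazed counterpart.
_, feat_hazy = reid_net(real_hazy)
_, feat_dehazed = reid_net(dehaze_net(real_hazy))
consistency_loss = F.mse_loss(feat_hazy, feat_dehazed)

total_loss = id_loss + dehaze_loss + consistency_loss
total_loss.backward()
```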
Images captured in rainy scenes usually suffer from poor visibility, which degrades the performance of computer vision applications. Rainy conditions can be divided into two categories: moderate-rain and heavy-rain scenes. Moderate-rain scenes mainly contain rain streaks, while heavy-rain scenes contain both rain streaks and a veiling effect (similar to haze). Although existing methods have achieved excellent results on each of these two cases individually, a unified architecture that handles both heavy-rain and moderate-rain scenes effectively is still lacking. In this paper, we construct a hierarchical multi-direction representation network using the contourlet transform (CT) to address both moderate-rain and heavy-rain scenes. The CT divides an image into multi-direction subbands (MS) and a semantic subband (SS). First, rain-streak information is retrieved from the MS based on the multiple directions of the CT. Second, a hierarchical architecture is proposed to reconstruct background information, including the damaged semantic information and the veiling effect, from the SS. Finally, a multi-level subband discriminator with feedback error maps is proposed; with this module, all subbands can be well optimized. This is the first architecture that can effectively handle both scenarios. The code is available at https://github.com/cctakaet/ContourletNet-BMVC2021.
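To illustrate the kind of decomposition the paper builds on, the toy code below splits an image into a low-pass "semantic" subband and four direction-selective high-frequency subbands. This is a heavily simplified stand-in, not the actual contourlet transform (which uses a Laplacian pyramid followed by directional filter banks):

```python
import torch
import torch.nn.functional as F

def toy_subbands(img):  # img: (B, 1, H, W) grayscale
    # Low-pass "semantic subband": 3x3 box blur.
    lp_kernel = torch.full((1, 1, 3, 3), 1.0 / 9.0)
    ss = F.conv2d(img, lp_kernel, padding=1)

    # High-frequency residual carries rain streaks and fine structure.
    high = img - ss

    # Direction-selective subbands via oriented derivative kernels.
    d = [[[0, 0, 0], [-1, 0, 1], [0, 0, 0]],   # horizontal
         [[0, -1, 0], [0, 0, 0], [0, 1, 0]],   # vertical
         [[-1, 0, 0], [0, 0, 0], [0, 0, 1]],   # diagonal 45 deg
         [[0, 0, -1], [0, 0, 0], [1, 0, 0]]]   # diagonal 135 deg
    dir_kernels = torch.tensor(d, dtype=torch.float32).unsqueeze(1)
    ms = F.conv2d(high, dir_kernels, padding=1)  # (B, 4, H, W)
    return ms, ss

ms, ss = toy_subbands(torch.rand(1, 1, 128, 128))
print(ms.shape, ss.shape)  # (1, 4, 128, 128) and (1, 1, 128, 128)
```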
This paper expounds the design and control of a new Variable Stiffness Series Elastic Actuator (VSSEA). It is established by employing a modular mechanical design approach that allows us to effectively optimise the stiffness modulation characteristics and power density of the actuator. The proposed VSSEA possesses the following features: i) no limitation in the work range of the output link, ii) a wide range of stiffness modulation (~20Nm/rad to ~1kNm/rad), iii) low-energy-cost stiffness modulation at equilibrium and non-equilibrium positions, iv) compact design and high torque density (~36Nm/kg), and v) high-speed stiffness modulation (~3000Nm/rad/s). Such features can help boost the safety and performance of many advanced robotic systems, e.g., a cobot that physically interacts with unstructured environments and an exoskeleton that provides physical assistance to human users. These features also enable us to exploit the variable stiffness property to attain various regulation and trajectory tracking control tasks using only conventional controllers, eliminating the need for synthesising complex motion control systems in compliant actuation. To this end, it is experimentally demonstrated that the proposed VSSEA is capable of precisely tracking desired position and force control references through the use of conventional Proportional-Integral-Derivative (PID) controllers.
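As a sketch of the control claim, a conventional discrete PID law of the kind the authors rely on is only a few lines; the gains and the unit-inertia plant below are illustrative assumptions, not the paper's hardware parameters:

```python
class PID:
    """Minimal discrete PID controller."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive a unit-inertia link toward a 1 rad position reference.
pid = PID(kp=50.0, ki=5.0, kd=10.0, dt=0.001)
pos, vel = 0.0, 0.0
for _ in range(5000):
    torque = pid.step(1.0, pos)   # torque command from the PID law
    vel += torque * 0.001         # integrate unit-inertia dynamics
    pos += vel * 0.001
print(round(pos, 3))              # settles near 1.0
```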
Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the ``objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. Additionally, it is extremely lightweight (0.4 MB memory requirement) and suitable for mobile and robotic applications. The dataset split and code will be made publicly available upon acceptance.
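A rough sketch of the patch-then-merge pipeline follows; classical agglomerative clustering on hand-made geometric features (patch centroid and mean depth) stands in for the learned deep merging network, and the 8x8 patch size and feature weighting are arbitrary choices:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
depth = rng.random((48, 48))  # placeholder depth map

# Step 1: oversegment into 8x8 patches and compute simple geometric cues
# (patch centroid and mean depth).
feats = []
for i in range(0, 48, 8):
    for j in range(0, 48, 8):
        patch = depth[i:i + 8, j:j + 8]
        feats.append([i + 4, j + 4, 100.0 * patch.mean()])  # weight depth cue
feats = np.array(feats)

# Step 2: merge patches bottom-up; the paper learns this merging step instead.
labels = AgglomerativeClustering(n_clusters=5, linkage="average").fit_predict(feats)
print(labels.reshape(6, 6))  # per-patch instance assignments
```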
A self-supervised adaptive low-light video enhancement (SALVE) method is proposed in this work. SALVE first conducts an effective Retinex-based low-light image enhancement on a few key frames of an input low-light video. Next, it learns mappings from the low- to enhanced-light frames via Ridge regression. Finally, it uses these mappings to enhance the remaining frames in the input video. SALVE is a hybrid method that combines components from a traditional Retinex-based image enhancement method and a learning-based method. The former component leads to a robust solution which is easily adaptive to new real-world environments. The latter component offers a fast, computationally inexpensive and temporally consistent solution. We conduct extensive experiments to show the superior performance of SALVE. Our user study shows that 87% of participants prefer SALVE over prior work.
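The Ridge-regression stage maps low-light patches of a key frame to their enhanced counterparts and reuses that mapping on other frames. A minimal sketch, with a gamma curve standing in for the Retinex-based enhancer and an arbitrary patch size:

```python
import numpy as np
from sklearn.linear_model import Ridge

def to_patches(img, p=8):
    h, w = img.shape
    return np.array([img[i:i+p, j:j+p].ravel()
                     for i in range(0, h - p + 1, p)
                     for j in range(0, w - p + 1, p)])

def from_patches(patches, shape, p=8):
    out, k = np.zeros(shape), 0
    for i in range(0, shape[0] - p + 1, p):
        for j in range(0, shape[1] - p + 1, p):
            out[i:i+p, j:j+p] = patches[k].reshape(p, p)
            k += 1
    return out

def enhance(img):
    return np.clip(img ** 0.5, 0, 1)  # gamma curve standing in for Retinex

rng = np.random.default_rng(0)
key_frame = rng.random((64, 64)) * 0.2       # dark key frame
model = Ridge(alpha=1.0)
model.fit(to_patches(key_frame), to_patches(enhance(key_frame)))

next_frame = rng.random((64, 64)) * 0.2      # another dark frame
enhanced = from_patches(model.predict(to_patches(next_frame)), next_frame.shape)
print(enhanced.mean() > next_frame.mean())   # True: the frame is brightened
```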
Hyperparameter tuning is critical to the success of federated learning applications. Unfortunately, appropriately selecting hyperparameters is challenging in federated networks. Issues of scale, privacy, and heterogeneity introduce noise in the tuning process and make it difficult to evaluate the performance of various hyperparameters. In this work, we perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We first identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can significantly impact tuning methods, reducing the performance of state-of-the-art approaches to that of naive baselines. To address noisy evaluation in such scenarios, we propose a simple and effective approach that leverages public proxy data to boost the evaluation signal. Our work establishes general challenges, baselines, and best practices for future work in federated hyperparameter tuning.
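One of the noise sources, client subsampling, is easy to illustrate: evaluating a configuration on a random subset of heterogeneous clients gives a noisy estimate of its true loss, and the noise grows as the subsample shrinks. A toy simulation with synthetic client losses:

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients = 1000
# Heterogeneous clients: each has a different loss for the same config.
client_losses = rng.normal(loc=1.0, scale=0.5, size=num_clients)

def noisy_eval(sample_size):
    idx = rng.choice(num_clients, size=sample_size, replace=False)
    return client_losses[idx].mean()

true_loss = client_losses.mean()
for m in (10, 100, 1000):
    evals = [noisy_eval(m) for _ in range(200)]
    print(f"{m:4d} clients: std of evaluation = {np.std(evals):.4f} "
          f"(true loss {true_loss:.3f})")
# Smaller subsamples -> noisier evaluations -> harder to rank configurations.
```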
We present a simple approach that turns a ViT encoder into an efficient video model which works seamlessly with both image and video inputs. By sparsely sampling the inputs, the model can train and run inference on both. The model is easily scalable and can be adapted to large-scale pre-trained ViTs without requiring full finetuning. The model achieves SOTA results and the code will be open-sourced.
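A minimal sketch of the sparse-sampling idea, with a small stand-in encoder in place of a pre-trained ViT and mean pooling over the sampled frames (the paper's actual model and sampling schedule are more involved):

```python
import torch
import torch.nn as nn

class DummyViT(nn.Module):
    """Stand-in for a pre-trained ViT image encoder."""
    def __init__(self, dim=64):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=1)
    def forward(self, x):                        # x: (B, 3, H, W)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens).mean(dim=1)  # (B, dim)

def encode_video(vit, video, num_samples=4):
    # video: (B, T, 3, H, W); sparsely sample num_samples frames,
    # encode each with the shared image encoder, then pool over time.
    B, T = video.shape[:2]
    idx = torch.linspace(0, T - 1, num_samples).long()
    frames = video[:, idx].flatten(0, 1)         # (B * num_samples, 3, H, W)
    feats = vit(frames).view(B, num_samples, -1)
    return feats.mean(dim=1)                     # temporal pooling

vit = DummyViT()
print(encode_video(vit, torch.rand(2, 32, 3, 64, 64)).shape)  # (2, 64)
```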
We propose a novel hematoxylin and eosin (H&E) stain normalization method based on a modified U-Net neural network architecture. Unlike previous deep-learning methods that were often based on generative adversarial networks (GANs), we take a teacher-student approach and use paired datasets generated by a trained CycleGAN to train a U-Net to perform the stain normalization task. Through experiments, we compared our method to two recent competing methods, CycleGAN and StainNet, a lightweight approach also based on the teacher-student model. We found that our method is faster and can process larger images with better quality compared to CycleGAN. We also compared to StainNet and found that our method delivered quantitatively and qualitatively better results.
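The teacher-student setup reduces to a standard supervised loop once the teacher's outputs are available. In the sketch below, a placeholder function stands in for the trained CycleGAN generator and a drastically reduced U-Net stands in for the student; real training uses paired H&E tiles, not random tensors:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Reduced U-Net stand-in: one down/up level with a skip connection."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                  nn.ReLU())
        self.up = nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1)
        self.out = nn.Conv2d(6, 3, 3, padding=1)
    def forward(self, x):
        y = self.up(self.down(x))
        return self.out(torch.cat([x, y], dim=1))  # skip connection

def teacher_cyclegan(x):
    return x.flip(1)  # placeholder for the trained CycleGAN generator

student = TinyUNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for _ in range(10):                    # toy training loop
    tiles = torch.rand(4, 3, 64, 64)   # source-stain image tiles
    target = teacher_cyclegan(tiles)   # teacher-generated normalized tiles
    loss = loss_fn(student(tiles), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final L1 loss: {loss.item():.4f}")
```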
The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, able to process RAW 12MP photos directly on mobile phones in under 1.5 seconds while producing high perceptual photo quality. To train and evaluate the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting of thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of traditional ISP pipelines, while outperforming previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs and APUs, which can be used to further reduce the latency of the model to as little as 0.5 seconds. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2
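Learned ISPs of this kind typically take the Bayer mosaic packed into a four-channel, half-resolution tensor rather than the raw single-channel frame. The sketch below assumes an RGGB pattern and 12-bit values (the actual layout and bit depth depend on the sensor):

```python
import numpy as np

def pack_bayer_rggb(raw):
    """raw: (H, W) Bayer mosaic with even H, W -> (H/2, W/2, 4) RGGB planes."""
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G (red rows)
                     raw[1::2, 0::2],   # G (blue rows)
                     raw[1::2, 1::2]],  # B
                    axis=-1)

raw = np.random.randint(0, 4095, size=(3000, 4000), dtype=np.uint16)  # ~12MP
packed = pack_bayer_rggb(raw.astype(np.float32) / 4095.0)  # normalize 12-bit
print(packed.shape)  # (1500, 2000, 4)
```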
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and task the participants with designing an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating frame rates of up to 500 FPS at a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.